*** OUT OF DATE ***

We just released a new version of metacoder to CRAN with many changes. We will update this website as soon as possible. In the mean time, look at the intoduction vignette/readme here: https://github.com/grunwaldlab/metacoder

Introduction

There is an abundance of publically available taxonomic information, including:

  • Sequence or abundance data with associated classifications from experiments (e.g. Genbank accessions)
  • Reference sequence databases (e.g. Greengenes, UNITE, RDP, SILVA)
  • Taxonomy databases (e.g. NCBI taxonomy, ITIS)
  • Species occuarance databases (e.g. GBIF)

However, these data sources tend to have different ways of specifying taxonomic classifications, including:

  • Full classification with taxon names (e.g. Bacteria;Proteobacteria;Gammaproteobacteria; Enterobacterales;Enterobacteriaceae;Escherichia;coli)
  • Taxon ID from a reference database (e.g. NCBI taxon ID 562)
  • Taxon name (e.g. Escherichia coli)
  • Sequence/Observation ID (e.g. Genbank accession BA000007)

In addition, the same method of defining classifications could be encoded with different formats. For example, all these formats are from refernece databases that supply classifications using a series of taxon names:

  • RDP:
S001191995 uncultured archaeon; LCDARCH35   Lineage=Root;rootrank;Archaea;domain;"Euryarchaeota";phylum;"Methanomicrobia";class;Methanosarcinales;order;untaxmap_Methanosarcinales;
  • SILVA:
AB001438.1.1662 Eukaryota;SAR;Alveolata;Dinoflagellata;Dinophyceae;Gymnodiniphycidae;Kareniaceae;Gyrodinium sp.
  • UNITE:
Lachnum_sp|JQ347180|SH189775.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp

All this diversity can make parsing data into a common form difficult.

Using extract_taxonomy

Rather than exacerbating the syntactic pollution with another custom format or creating a parser for every database-specific format, the function extract_taxonomy parses taxonomy information from arbitrary characters strings (e.g. sequence headers) identified by a regex expression. Although the motivation for creating extract_taxonomy is to parse FASTA sequence headers, the function is not specific to FASTA or even to sequence information, so I will use the word “observation” instead of “sequence header” from now on.

extract_taxonomy can get taxonomic classfiacitons from any of the previously mentioned format types:

  • Full classification with taxon names (e.g. Bacteria;Proteobacteria;Gammaproteobacteria; Enterobacterales;Enterobacteriaceae;Escherichia;coli)
  • Taxon ID from a reference database (e.g. NCBI taxon ID 562)
  • Taxon name (e.g. Escherichia coli)
  • Sequence/Observation ID (e.g. Genbank accession BA000007)

For all but a full classifcaiton with taxon names (e.g. Bacteria;Proteobacteria;Gammaproteobacteria; Enterobacterales;Enterobacteriaceae;Escherichia;coli), extract_taxonomy will query online databases to get the classificaiton. Note that this usually invloves quierying NCBI, which is often quite slow, so it it best to use data that has a full classifcaiton with taxon names when possible. At the minimum, the output of extract_taxonomy consists of unique taxon identifiers, and the tree structure of the taxonomy shared by all sequences.

NOTE: extract_taxonomy has been designed principally for parsing character vectors, like FASTA headers, rather than tabular data, like the abundance matricies produced by QIIME and mothur. There is another function in development called parse_taxonomy_table intended to parse this kind of data, but it is not quite mature yet, so it will not be discussed for now. See ?parse_taxonomy_table for more information.

Input

There are three important arguments that will usually be relevant: regex, key, and database.

  • The regex argument defines the structure of the strings and the location of relevant information (e.g. GenBank accession numbers) via regex capture groups (i.e. pairs of unescaped parentheses).
  • The key argument defines what kind of information is in each capture group, determining how it will be parsed. The elements of key has a defined set of possible values; see the extract_taxonomy help page for options.
  • The database argument determines the online taxonomy database that will be queried when necessary. Usually, this is ncbi, but others databases are possible (although not as well tested yet); see the extract_taxonomy help page for options.

Other arguments are important in special cases. Some will be explained in the examples below, but for more details see ?extract_taxonomy.

Output

An object of type taxmap is returned. See ?taxmap for more details.

Parsing FASTA files

FASTA files are one of the most common formats that sometimes contain taxonomic data. The read.FASTA function from the ape (Paradis, Claude, and Strimmer 2004) package is commonly used to parse FASTA files in R and extract_taxonomy recognizes its output. When the output of ape::read.FASTA is given to extract_taxonomy, the headers are parsed and the sequences are saved in the output object.

Genbank FASTA headers

FASTA files downloaded from GenBank custom queries contain the GenBank id and the accession number/version to identify sequences. Taxonomic information can be retrieved using either of these identifiers. The following shows how to extract the GI numbers, accession id, and description from the headers. The GenBank accession number is being used to look up the taxonomy information (hence "obs_id" in key option), while the GI numbers and description are being returned without contributing taxonomic information (hence "obs_info" in key option).

library(metacoder)
file_path <- system.file("extdata", "ncbi_basidiomycetes.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
genbank_ex_data <- extract_taxonomy(sequences,
                                    regex = "^.*\\|(.*)\\|.*\\|(.*)\\|(.*)$",
                                    key = c(gi_no = "obs_info", "obs_id", desc = "obs_info"),
                                    database = "ncbi")

The sequence headers have the format:

[1] "gi|626494018|ref|NR_119774.1| Cortinarius cramesinus PDD 27173 ITS region; from TYPE material"        
[2] "gi|1028916741|ref|NR_137130.1| Lactarius perparvus GENT KW320 ITS region; from TYPE material"         
[3] "gi|1028916731|ref|NR_137120.1| Cortinarius elotoides IB 19870060 ITS region; from TYPE material"      
[4] "gi|1028916725|ref|NR_137114.1| Stomatisora psychotriicola PREM 60886 ITS region; from TYPE material"  
[5] "gi|1028916658|ref|NR_137047.1| Fomitiporia tenuis MUCL 44802 ITS region; from TYPE material"          
[6] "gi|1028916619|ref|NR_137008.1| Derxomyces qinlingensis CGMCC AS 2.2446 ITS region; from TYPE material"

We can plot the result using plot:

set.seed(2)
heat_tree(genbank_ex_data,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name,
          layout = "fruchterman")

UNITE FASTA headers

The format of the UNITE (Kõljalg et al. 2005) FASTA release has two pieces of information from which classifications can be determined. The GenBank sequence identifier can be used to look up the taxon id from GenBank. Alternatively, the classifications specified in the header can be used to make an arbitrarily coded taxonomy.

Using the sequence identifier

The GenBank accession number in the second entry of UNITE sequence headers can be used to look up the taxon assigned to each sequence by GenBank. However, some of the sequences in the UNITE database do not have a GenBank accession number; these IDs start with UDB and should be filtered out:

file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_1 <- extract_taxonomy(sequences[!grepl(pattern = "\\|UDB", names(sequences))],
                                    regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
                                    key = c(seq_name = "obs_info", "obs_id",
                                            other_id = "obs_info", tax_string = "obs_info"),
                                    database = "ncbi")

The sequence headers have the format:

[1] "Lachnum_sp|JQ347180|SH189775.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"                              
[2] "Lachnellula_calyciformis|U59145|SH189776.06FU|refs|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnellula;s__Lachnellula_calyciformis"
[3] "Lachnum_sp|AM084756|SH189777.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"                              
[4] "Lachnum_sp|FM172814|SH189778.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"                              
[5] "Lachnum_sp|FN539058|SH189779.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"                              
[6] "Lachnum_pulverulentum|AB481260|SH189780.06FU|refs|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_pulverulentum"        
set.seed(10)
filter_taxa(unite_ex_data_1, name == "Fungi", subtaxa = TRUE) %>%
  heat_tree(node_size = n_obs,
            node_color = n_obs,
            node_label = name,
            layout = "davidson-harel",
            overlap_avoidance = 0.5)

Using included classification names to look up the taxon id

If you want to use the structure and names of the classification provided in the header, but still look up the official taxon id, you can provide "class" to key and specify a database to look up taxon ids from. Since the UNITE classification also included the rank for each taxon, we will have to specify how to parse each taxon string using the class_regex and class_key options. These work the same as regex and key, except they apply to each element in a classification. You can still capture the sequence id or taxon id (assuming its present in the header) by using "obs_info" or "taxon_info" where you would otherwise use "obs_id" or "taxon_id".

file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_2 <- extract_taxonomy(sequences,
                                    regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
                                    key = c(seq_name = "obs_info", seq_id = "obs_info",
                                            other_id = "obs_info", "class"),
                                    class_regex = "^(.*)__(.*)$",
                                    class_key = c(unite_rank = "taxon_info", "name"),
                                    class_sep = ";",
                                    database = "ncbi")
heat_tree(unite_ex_data_2,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name)

Note that the taxon name (entry 1) and the sequence id (entry 2) are now encoded as "obs_info", causing them to be interpreted as generic data. This means that only the classification string (e.g. k__Fungi;p__Ascomycota;c__Leotiomycetes) will be interpreted as having taxonomic information, but the other information will also be included in the output in columns named name and sequence_id. The unique taxon id for each taxon encountered will be looked up and taxa not found will be encoded as NA.

Using the included classifications to generate arbitrary ids

It is also possible to build a custom taxonomy encoding using the taxonomy in the sequence headers without looking up the official taxon ids of each taxon. Taxa will be assigned arbitrary taxon ids that will be specific to the current analysis. This is prefereable to just using unique taxon names as IDs, since there are often placeholder taxa names such as “unknown” or “spp” and occasionally different taxa can have to same name. This is the method that is most useful if available, since it does not require an internet connection. To do this, leave off the database option from the previous example.

file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_3 <- extract_taxonomy(sequences,
                                    regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
                                    key = c(seq_name = "obs_info", seq_id = "obs_info",
                                            other_id = "obs_info", "class"),
                                    class_regex = "^(.*)__(.*)$",
                                    class_key = c(unite_rank = "taxon_info", "name"),
                                    class_sep = ";")
heat_tree(unite_ex_data_3,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name)

ITS1 DB FASTA headers

The ITS1 database FASTA header includes the GenBank taxon id. In the example below, both the "obs_id" and "taxon_id" keys are provided, but only the "taxon_id" is used to look up taxonomy information since it has precedence.

file_path <- system.file("extdata", "its1_chytridiomycota_hmm.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
its1_ex_data <- extract_taxonomy(sequences,
                                 regex = "^(.*)\\|(.*)\\|tax_id:(.*)\\|(.*)$",
                                 key = c("obs_id", taxon_name = "taxon_info",
                                         "taxon_id", description = "obs_info"),
                                 database = "ncbi")

The sequence headers have the format:

[1] "HQ191393_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,179bp length; "
[2] "HQ191391_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,251bp length; "
[3] "KJ464412_ITS1_HMM|Rhizophydium sphaerotheca|tax_id:109902|ITS1 located by HMM profiles,145bp length; " 
[4] "HQ191291_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,171bp length; "
[5] "HQ191292_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,178bp length; "
[6] "HQ191295_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,177bp length; "
set.seed(1)
heat_tree(its1_ex_data,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name,
          layout = "fruchterman-reingold")

PR2 FASTA headers

The first observation in the PR2 (Guillou et al. 2012) header is sometimes a GenBank accession number, but not always. Therefore, the best option is to use the included taxonomy information.

file_path <- system.file("extdata", "pr2_stramenopiles_gb203.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
pr2_ex_data <- extract_taxonomy(sequences,
                                regex = "^(.*\\..*?)\\|(.*)$",
                                key = c("obs_id", "class"),
                                class_sep = "\\|")

The sequence headers have the format:

[1] "10-044.1.1773|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."                                 
[2] "10-045.1.1772|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."                                 
[3] "10-150.1.1771|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."                                 
[4] "10-374.1.1778|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."                                 
[5] "HQ166719.1.1035_U|Eukaryota|Stramenopiles|Ochrophyta|Bacillariophyta|Bacillariophyta_X|Polar-centric-Mediophyceae|Chaetoceros|Chaetoceros+calcitrans"
[6] "JQ781961.1.1805_U|Eukaryota|Stramenopiles|Stramenopiles_X|MAST|MAST-4|MAST-4A|MAST-4A_X|MAST-4A_X+sp."                                               
heat_tree(pr2_ex_data,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name)

RDP FASTA headers

This RDP (Maidak et al. 2001) FASTA file does not contain any references to taxon or sequence ids that could be used to look up more information. Instead, we will parse the taxonomy information included in the sequence headers. In this case both the rank and taxon name are supplied.

file_path <- system.file("extdata", "rdp_current_Archaea_unaligned.fa", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
rdp_ex_data <- extract_taxonomy(sequences,
                                regex = "^(.*?) (.*)\\tLineage=(.*)",
                                key = c(id = "obs_info", description = "obs_info", "class"),
                                class_regex = "(.+?);(.*?);",
                                class_key = c("name", "taxon_info"))
[1] "S001191995 uncultured archaeon; LCDARCH35\tLineage=Root;rootrank;Archaea;domain;\"Euryarchaeota\";phylum;\"Methanomicrobia\";class;Methanosarcinales;order;untaxmap_Methanosarcinales;"
[2] "S002342293 uncultured archaeon; Arch-2\tLineage=Root;rootrank;Archaea;domain;\"Euryarchaeota\";phylum;untaxmap_\"Euryarchaeota\";"                                                     
[3] "S002876736 uncultured archaeon; AydatHIV_14\tLineage=Root;rootrank;Archaea;domain;\"Thaumarchaeota\";phylum;Nitrosopumilales;order;Nitrosopumilaceae;family;Nitrosopumilus;genus"      
[4] "S002934720 uncultured archaeon; A0423R002_A13\tLineage=Root;rootrank;Archaea;domain;\"Euryarchaeota\";phylum;\"Methanomicrobia\";class;untaxmap_\"Methanomicrobia\";"                  
[5] "S003785628 uncultured archaeon; PC500-1A66\tLineage=Root;rootrank;Archaea;domain;\"Crenarchaeota\";phylum;Thermoprotei;class;untaxmap_Thermoprotei;"                                   
[6] "S000511720 uncultured crenarchaeote; I3K-F4\tLineage=Root;rootrank;Archaea;domain;\"Thaumarchaeota\";phylum;Nitrosopumilales;order;Nitrosopumilaceae;family;Nitrosopumilus;genus"      

Note that the same character ( ; ) is used to delineate different ranks and taxa. This makes it the class_sep option inappropriate. If the class_sep option is not defined, each match of class_regex is considered to be a different taxa. Therefore, we can use a class_regex that matches two ; to correctly parse this awkward format.

set.seed(3)
heat_tree(rdp_ex_data,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name,
          layout = "da")

SILVA FASTA headers

SILVA (Quast et al. 2012) has a relativly simple format with an embedded taxonomy:

file_path <- system.file("extdata", "silva_nr99.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
silva_ex_data <- extract_taxonomy(sequences,
                                  regex = "^(.*?) (.*)$",
                                  key = c(id = "obs_info", "class"),
                                  class_sep = ";")
[1] "AB001438.1.1662 Eukaryota;SAR;Alveolata;Dinoflagellata;Dinophyceae;Gymnodiniphycidae;Kareniaceae;Gyrodinium sp."                                         
[2] "AB006051.1.1796 Eukaryota;Archaeplastida;Chloroplastida;Chlorophyta;Trebouxiophyceae;Chlorellales;Lobosphaera;Lobosphaera tirolensis"                    
[3] "FJ911802.1.1606 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Poecilochirus sp. APGD-2010"
[4] "FJ911814.1.1608 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Arrenoseius sp. APGD-2010"  
[5] "FJ911819.1.1607 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Eviphis sp. 1 APGD-2010"    
[6] "FJ911835.1.1585 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Dermanyssus hirundinis"     
set.seed(2)
heat_tree(silva_ex_data,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name,
          tree_label = name)

Parsing raw characters

A list of sequence IDs

If the input is only the data used to look up classifications, without any other text to seperate from it, the regex argument can be omitted. Say someone emails you a list of ncbi ids that you want to plot. You could use the following code to get the taxonomic information.

raw <- "JQ086376.1 AM946981.2 JQ182735.1 CP001396.1 J02459.1 AC150248.3 X64334.1 CP001509.3 CP006698.1 AC198536.1 JF340119.2 KF771025.1 CP007136.1 CP007133.1 U39286.1 CP006584.1 EU421722.1 U03462.1 U03459.1 AC198467.1 V00638.1 CP007394.1 CP007392.1 HG941718.1 HG813083.1 HG813082.1 CP007391.1 HG813084.1 CP002516.1 KF561236.1 JX509734.1 AP010953.1 U39285.1 M15423.1 X98613.1 CP006784.1 CP007393.1 CU928163.2 AP009240.1 CP007025.1 CP006027.1 CP003301.1 CP003289.1 CP000946.1 CP002167.1 HG428755.1 JQ086370.1 CP001846.1 CP001925.1 X99439.1 AP010958.1 CP001368.1 AE014075.1 CP002212.1 CP003034.1 CP000243.1 AY940193.1 CP004009.1 JQ182732.1 U02453.1 AY927771.1 BA000007.2 CP003109.1 CP007390.1 U02426.1 U02425.1 CP006262.1 HG738867.1 U00096.3 FN554766.1 CP001855.1 L19898.1 AE005174.2 FJ188381.1 AK157373.1 JQ182733.1 U39284.1 U37692.1 AF129072.1 FM180568.1 CP001969.1 HE616528.1 CP002729.1 JF974339.1 AB248924.1 AB248923.1 CP002291.1 X98409.1 CU928161.2 CP003297.1 FJ797950.1 CP000038.1 U82598.1 CP002211.1 JQ806764.1 U03463.1 CP001665.1"
ids <- strsplit(raw, " ")[[1]]
contaminants <- extract_taxonomy(ids, key = "obs_id", database = "ncbi")
set.seed(3)
heat_tree(contaminants,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name,
          tree_label = name,
          layout = "fruchterman-reingold")

Scraping taxon names from the web

Any list of taxon names can also be parsed by leaving off the regex option. One way to get those names from a website that does not have an API is to parse the HTML of the website using and html parser and XPATH. Note that this example uses ITIS instead of NCBI to look up classifications for the taxa scraped from The Plant List:

library(XML)
taxon_names <- htmlTreeParse("http://www.theplantlist.org/1.1/browse/B/") %>% 
  xmlRoot() %>%
  getNodeSet("//ul[@id='nametree']/li/a/i") %>% # The string is an XPATH expression
  sapply(xmlValue)
bryophytes_ex_data <- extract_taxonomy(taxon_names, key = "name", database = "itis")
set.seed(2)
heat_tree(bryophytes_ex_data,
          node_size = n_obs,
          node_color = n_obs,
          node_label = name,
          tree_label = name,
          layout = "davidson-harel")

Software and packages used

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] metacoder_0.1.2     knitcitations_1.0.7 knitr_1.14         
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.9       formatR_1.4       plyr_1.8.4        bitops_1.0-6     
##  [5] tools_3.3.1       digest_0.6.12     gtable_0.2.0      lubridate_1.6.0  
##  [9] evaluate_0.10     tibble_1.2        nlme_3.1-128      lattice_0.20-34  
## [13] bibtex_0.4.0      igraph_1.0.1      DBI_0.5-1         yaml_2.1.13      
## [17] parallel_3.3.1    RefManageR_0.13.1 dplyr_0.5.0       httr_1.2.1       
## [21] stringr_1.1.0     rprojroot_1.2     R6_2.2.0          XML_3.98-1.4     
## [25] rmarkdown_1.3     RJSONIO_1.3-0     reshape2_1.4.2    ggplot2_2.2.1    
## [29] magrittr_1.5      backports_1.0.5   scales_0.4.1      htmltools_0.3.5  
## [33] assertthat_0.1    ape_4.0           colorspace_1.2-7  labeling_0.3     
## [37] stringi_1.1.2     RCurl_1.95-4.8    lazyeval_0.2.0    munsell_0.4.3

References

Guillou, Laure, Dipankar Bachar, Stéphane Audic, David Bass, Cédric Berney, Lucie Bittner, Christophe Boutte, et al. 2012. “The Protist Ribosomal Reference Database (PR2): A Catalog of Unicellular Eukaryote Small Sub-Unit RRNA Sequences with Curated Taxonomy.” Nucleic Acids Research. Oxford Univ Press, gks1160.

Kõljalg, Urmas, Karl-Henrik Larsson, Kessy Abarenkov, R Henrik Nilsson, Ian J Alexander, Ursula Eberhardt, Susanne Erland, et al. 2005. “UNITE: A Database Providing Web-Based Methods for the Molecular Identification of Ectomycorrhizal Fungi.” New Phytologist 166 (3). Wiley Online Library: 1063–8.

Maidak, Bonnie L, James R Cole, Timothy G Lilburn, Charles T Parker Jr, Paul R Saxman, Ryan J Farris, George M Garrity, Gary J Olsen, Thomas M Schmidt, and James M Tiedje. 2001. “The RDP-II (Ribosomal Database Project).” Nucleic Acids Research 29 (1). Oxford Univ Press: 173–74.

Paradis, Emmanuel, Julien Claude, and Korbinian Strimmer. 2004. “APE: Analyses of Phylogenetics and Evolution in R Language.” Bioinformatics 20 (2). Oxford Univ Press: 289–90.

Quast, Christian, Elmar Pruesse, Pelin Yilmaz, Jan Gerken, Timmy Schweer, Pablo Yarza, Jörg Peplies, and Frank Oliver Glöckner. 2012. “The SILVA Ribosomal RNA Gene Database Project: Improved Data Processing and Web-Based Tools.” Nucleic Acids Research. Oxford Univ Press, gks1219.

Comments